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1 Abstract 

We consider a statistical model for pairs of traded assets, based on a Cointegrated Vector Auto Regression 
(CVAR) Model. We extend standard CVAR models to incorporate estimation of model parameters in 
the presence of price series level shifts which are not accurately modeled in the standard Gaussian 
error correction model (ECM) framework. This involves developing a novel matrix variate Bayesian 
CVAR mixture model comprised of Gaussian errors intra-day and a-stable errors inter-day in the ECM 
framework. To achieve this we derive a novel conjugate posterior model for the Scaled Mixtures of 
Normals (SMiN CVAR) representation of a-stable inter-day innovations. These results are generalized to 
asymmetric models for the innovation noise at inter-day boundaries allowing for skewed a-stable models. 

Our proposed model and sampling methodology is general, incorporating the current literature on 
Gaussian models as a special subclass and also allowing for price series level shifts either at random esti- 
mated time points or known a priori time points. We focus analysis on regularly observed non-Gaussian 
level shifts that can have significant effect on estimation performance in statistical models failing to ac- 
count for such level shifts, such as at the close and open of markets. Wc compare the estimation accuracy 
of our model and estimation approach to standard frequentist and Bayesian procedures for CVAR models 
when non-Gaussian price series level shifts are present in the individual series, such as inter-day bound- 
aries. We fit a bi-variate a-stable model to the inter-day jumps and model the effect of such jumps on 
estimation of matrix- variate CVAR model parameters using the likelihood based Johansen procedure and 
a Bayesian estimation. We illustrate our model and the corresponding estimation procedures we develop 
on both synthetic and actual data. 

Keywords: Cointegrated Vector Autoregression, a-stable, Approximate Bayesian Computation. 
Journal of Econometrics 



2 Introduction 



In this paper we consider accurate estim ation of statistical mode ls for 



significant since recent empirical studies by 



Bock and Mestel [200911 and 



.1 



pairs trading strat egies. This is 



00^ have shown, 



that in spite of the increasing volume of statistical arbitrage quantitative funds performing algorithmic 
trading, statistical pair trading still seems to be consistently assessed as a profitable trading strategy, 
providing motivation to further develop such models. 

We focus on cointegrated v ector autoregression (CVAR) models wh ich have been studied widely in 



the econometric literature, see 



Engle and Granger [1987 1. Sugita [20091 . For the err or correction repre 



Granger and Weiss [200 lj and the overview of 



Koop et al. [2 0061. 



sentation of a co-integrated series, see 

Bayesian analy s is of CVAR models has been add re ssed in several papers, seelBauwens and Lubrano [199411 , 



Geweke [1996j . Kleibergen and Van Diik [200 



9]- 



Acker t and Racine [19991 and 



ooj In prac 



tice, Bayesian and n on-Bayesian CVAR models are used extensively in pairs trading, see an example in 
Peters et al. [2010a{ . We demonstrate that when estimating even the basic CVAR models using data 
which is sampled at a frequency less than one day, on real price series pairs, the accuracy and robustness 
of the statistical model fit and estimation and therefore the stability of the selected portfolio weights, is 
strongly affected by level shifts or jumps in price series due to inter-day movements. This is evident in 
settings in which the cointcgration rank is assumed known a nd so would b e compounded in settings in 
which uncertainty in the rank is also assumed, see analysis in lSugita [2009j |. 

Level shifts in each price series are due to complicated economic and social market factors, wc do 
not attempt to explain these with an economic rationale in this paper. Instead we demonstrate firstly 
that they occur regularly at the open and close of markets between joint trading times for pairs and 
secondly that statistical inference based on data that fails to appropriately account for these level shifts 
in a co-integration framework will result in poor model calibrations. We then develop and demonstrate 
a robust statistical approach to overcome this practically important estimation problem. 

Typically one observes level shifts in the price series occurring as a result of the time delay between 
the open and close of markets for each asset in the traded pair. However, the level of the price shifts 
can not solely be accounted for by the evolution of the statistical model during the time period in which 
cither market is closed. Some asset pairs may only have short periods of overlap in which each market 
is open and therefore the joint assets can be traded, it is particularly important to accurately model the 
inter-day level shifts for such pairs. We demonstrate that one can not ignore this practical issue of price 
series level shifts as it can result in significant sensitivity in the estimated model parameters. This in turn 
has consequences for trading resulting from the knowledge of the cointegration deviation series, which is 
affected and therefore results in carry on effects for design of trading thresholds. 

We begin by studying the statistical properties of these inter-day level shifts in the differenced 
price series for several pairs of assets over multiple contract segments spanning several years. Each 
pair is chosen as they demonstrate historically statistically significant cointcgration properties. W e 
model the level shifts in each price series via the flexible class of a-stable models, see Izolotarev [198^ . 



2 



Samorodnitskv and Taggu [1994j , iQiou and Ravishanker [1998| and iNolan [1997j . This class of models 



is of particular interest as they are flexible in terms of skew and kurtosis, whilst also admitting Gaussian 
distributions as a family member. That is, we fit the a-stable models to the price level shift, obtained 
between the open and close of the time when both markets are trading. We demonstrate that in most 
cases the assumption of Gaussian residuals for these time periods, implied by fitting the basic CVAR 
model is inadequate. In particular several assets demonstrate significantly heavy tailed distributions are 
appropriate for capturing the inter-day price deviations resulting from these level shifts. Therefore this 
contradicts the typical statistical assumption, of constant homoskedastic multi-variate Gaussian inno- 
vation noise, made when fitting the basic CVAR models that are widely utilized when trading pairs or 
assets. As a consequence we propose a new CVAR model and Bayesian estimation framework to incor- 
porate the potential for a a-stable innovation noise at these particular known, deterministic time points. 
Thereby reducing the sensitivity of the estimated CVAR model parameters to the period in which both 
asset markets are not active. This can trivially be extended to include stochastic time points in a change 
point or switching structure. 

Chen and Hsiao [201p| which develops a cointcgration model for pairs 



This differs from the work of 



of assets in which only symmetric a-stable innovations are utilized at all trading time points, with a 
fixed tail index parameter a throughout the time series. We argue that this is an overly restrictive model 
simplification when used for trading purposes and in addition their approach can not be easily generalized 
to a Bayesian estimation framework, in which we focus our statistical estim ation methodology. Their 
approach generalizes the Johannsen procedure Johansen and Juselius [199o| to the symmetric a-stable 
innovation setting for testing the rank of the cointcgrated VAR model. Wc will demonstrate a more flexible 
model removing the symmetry assumption for the stable noise, introducing a more realistic mixture noise 
model and providing a novel Approximate Bayesian Co mputation (ABC) sa mpling methodology for 



Peters et al. [2010a |. 



estimation and rank selection, generalizing the approach of . 

Estimation of the matrix variate parameters of a CVAR model under either a Johansen based 
likelihood-procedure or a Bayesian modeling approach will be demonstrated, on both synthetic and 
actual data, to be adversely affected by level shifts in the price series occurring at the close and open 
of markets. This will typically be reflected in large changes in the estimated CVAR model parameters, 
especially the constant mean level, the cointcgration vectors and noise covariance matrix. In such situa- 
tions, trading systems utilizing such parameter estimates will therefore also be sensitive to the changes 
in parameter estimates arising from the level shifts at day break boundaries. In high- frequency settings, 
where estimations are performed anywhere between several seconds to 20 - 30 min intervals, simply dis- 
carding the time periods during which level shifts occur can result in significant loss of trading activity. 
This is especially the case when trading activity is occuring around close and open times of markets. In 
addition, when modeling in the setting in which level shifts can occur randomly throughout the trading 
day, discarding these time periods is not suitable. Therefore, from the perspective of estimation failing 
to incorporate these level shifts in the price series can significantly affect parameter estimation in key 
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quantities such as the co-integration vectors. If this issue is not addressed, this could result in regular 
changes to portfolio allocations, resulting in additional transaction costs and other complications related 
to trade volumes. Therefore, in this paper we postulate that the underlying CVAR model will be a suit- 
able model for the underlying price scries in which the parameter estimation can be made less sensitive 
through appropriately modeling the price level shifts in the intra-day prices at open and close of markets. 



2.1 Contribution and Structure 

The novelty of this paper involves three parts: first we develop a new matrix variate distributional model 
for Baycsian co-integration incorporating a mixture of matrix variate and matrix a-stablc observation 
errors under an error correction model (ECM) framework; the second aspect of novelty is to develop a 
scaled mixture of normals conjugate family of matrix variate Bayesian models for the estimation of the 
matrix parameters in the newly proposed model; the third aspect involves taking the non-symmetric ma- 
trix variate a-stable setting and developing a sampling procedure for this intractable Bayesian posterior 
model via ABC inference. This last aspect will involve a highly non-standard combination of an adaptive 
MCMC matrix variate Metropolis proposal with the conjugate "symmetric" a-stable matrix variate pos- 
terior models to obtain an efficient proposal mechanism within the ABC context. The ABC methodology 
will also be extended by the development of a mixed model in which aspects of the observation vector 
can be evaluated explicitly combined with the a-stable random matrix observation components captured 
by the ABC approximation. 

The multivariate a-stablc model is fitted to intra-day price level shifts over a range of currency pairs, 
each for 30 contract segments dating back to 1999 on minute level price data. This provides us with 
statistical modeling of the inter-day left shifts via generalized a-stable models for each asset pair. We 
then take the parameter estimates for the a-stable model and study the impa ct of naively applying 
the standard Johansen procedure and the Bayesian model of iPeters et al. [2010a| to a price series with 
intra-day level shifts generated from one of the more extreme currency pair a-stable fits. This study 
is performed for one hundred independently generated data sets and the impact on the frcqucntist and 
Bayesian point estimators is studied. A significant impact due to the price series level shifts on the 
parameter estimation is observed when fitting CVAR models ignoring the price level shifts in each series. 
We then develop our mixture model for the noise process in the CVAR setting and we introduce two novel 
adaptive MCMC algorithms to work with both the simplified symmetric multivariate a-stable model and 
also the more general skewed multivariate a-stable models. Finally we conclude with a detailed data 
analysis both on synthetic and actual data series for pairs. 

Notation We denote a Gaussian random (n x T) matrix by Y ~ N n .T(l^, S, A) with row dependence 
in (n x n) covariance matrix £ and column dependence in (T x T) matrix A. Additionally we denote the 
vectorization of a random matrix to a random vector by Vec(Y) which will produce an (nT x 1) random 
vector in which the columns are successively stacked. Furthermore we denote the kronecker product or 
tensor product between two matrices by (8>. 
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3 Gaussian CVAR Model under ECM Framework 



Working with the model presented in ISugita [2002 



we denote the vector observation at time t by Xt- 



Furthermore, we assume x t is an integrated of order 1, 1(1), (n x l)-dimcnsional vector with r linear 
cointegrating relationships. The error vector at time t, e t are assumed time independent and zero mean 
multivariate Gaussian distributed, with covariance E. The Error Correction Model (ECM) representation 
is given by, 



p-i 



Ax t = n + a/3'xt-i + ^ ^Ax^, + e t 



(3.1) 



where t = p,p+ 1, . . . ,T and p is the number of lags. Furthermore, the matrix dimensions are: \x and e t 
are (n x 1), Vtj and E are (n x n), a and f3 are (n x r). We can now re-express the model in equation 
(|3.ip in a multivariate regression format, as follows 



where, 



Y 



x 



Y = 


XT + 


Ax p Ax p+1 






e P+i 


... 


Ax[ 


Ax' p ... 


Ax' 2 



Ax T 



\ 1 Ax' T 



Axt 



,W 



z = 
,r 



3 + E, 



(3.2) 



X Z/3 



,B 



Xt-1 

T' a 



T-p+ 



1 / 



Here, we let t be the number of rows of Y, hence t = T — p + 1, producing X with dimension 
t x (1 + n(p — 1)), r with dimension ((1 + n(p — 1)) x n), W with dimension t x k and B with dimension 
(k x n), where k = 1 + n(p — 1) + r. The parameters p represents the trend coefficients, and is the i th 
matrix of autoregressive coefficients and the long run multiplier matrix is given by II = a/3'. 

The long run multiplier matrix is an important quantity of this model, its properties include: if II is 
a zero matrix, x t contains n unit roots; if IT has full rank, univariate series in x t are (trend-)stationary; 
and co-integration occurs when II is of rank r < n. The matrix (3 contains the co-integration vectors, 
reflecting the stationary long run relationships between the univariate series within x t and the a matrix 
contains the adj ustment parameters, spe cifying the speed of adjustment to equilibria f3'x t . 



According to 



Gupta and Nagar [19991 ] [Theorem 2.2.1] we see that if we have a random matrix variate 



Gaussian Y' ~ N n ^r(M, E, "P) with row dependence captured in E and column dependence captured in 
^, then the vectorized form, in which the columns are stacked on top of each other to make a nT x 1 
random vector, is multivariate Gaussian Vec(Y) ~ N n T(Vec(M), E (g> 1 i<). This allows us to represent the 
matrix variate likelihood for this regression, for the model parameters of interest B, E and f3, by 

-0.5nt|v> o T 1-0.5 . 

\ —U.OVW\I — VV U ) V5I J-} ) V CC\ 1 — VV U j j 

(3.3) 



L(B, E, (3; Y) = (27r)- a5nt |E eg) 7 t |- a5 cxp (~0.Wec{Y - WB)'^' 1 eg) I^)Vec{Y - WB)) 



oc |E|-°- 5t exp ^-0.5tr[E- 1 (S' + R)\ 
where E = Cov{e) and R = (B - B)'W'W(B - B), S = (Y - WB)'(Y - WB), B = {W'W^W'Y. 
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4 a-stable Models for Inter-day Differenced Price Shifts 



Noise modeling via a-stable distributions has been suggested in s everal areas, such as wireless commu 



nica tions and in financia l data analysis, see lFama and Roll [1968j . iGodsill [2000| . iNeslehova et al. [2006 
and IPeters et al. [2010a| . a-stable distributions possess several u seful properties, including infinite mean 



and infinite variance, skewness and heav y tails 



We consider the SO parameterization, see 



Zolotarev [19861 and 



Samorodnitskv and Taqqu [1994 1. 



Peters et al. [2010al | for details. Considered as generalizations 



of the Gaussian distribution, they are defined as the class of location-scale distributions which are closed 
under convolutions. Considering this class of noise process for inter-day price shifts allows us to include 
as a special sub-case the standard CVAR models which are assumed to have purely Gaussian innovations. 

Hence, we extend the CVAR model to incorporate a composite mixture of noise processes, with 
e t ~ Af(0, E) for intra-day samples and e t ~ S a (f3, / y, S) for inter-day observations. In this notation, the 
i-th asset has stable iter-day error model ~ S (<)(/8^,7^,<^)- Therefore, the resulting multivariate 



model we consider for innovation errors e* at time t is given by dependent elements e 



'0 



N (0, I (t i t) + 5 aM (j8W,7 (<) , * (<) ) I (t e r) 



(4.1) 



where S a (/?, 7, S) denotes the a-stable distribution and r represents a vector of each of the first instants 
in time that both assets can be traded on their respective markets on each given day for the data series. 

The univariate a-stable distribution is typically specified by four parameters: a S (0, 2] determining 
the rate of tail decay; j3 <G [—1,1] determining the degree and sign of asymmetry (skewness); 7 > the scale 
(under some parameterizations); and S S R the location. The parameter a is termed the characteristic 
exponent, with small and large a implying heavy and light tails respectively. Gaussian (a = 2, = 0) and 
Cauchy (a. = l,/3 = 0) distributions provide the only analytically tractable sub- members of this family. 
In general, as a-stable models admit no closed form expression for the density which can be evaluated 
point-wise (excepting Gaus sian and Cauchy mem bers) , inference typically proceeds via the characteristic 
function, see discussions in lPeters et al. [2010a | . Though, intr actable to evaluate point-wise, simulation 

Chambers et al. [1976l |. This observation is crucial to the ABC 



of random variates is very efficient, see 
based approach we develop. 

We can estimate the a-stable model parameters for the day boundary level shifts in our model in sev- 

McCulloch [198fij . 



cral ways, for example a quantile based genera lized method of moment type procedure of 
or a maximum likelihood based approach of lNolan [1997l |. In addition Nolan has made available com 
mercial and academic software for fitting univariate stable models, see his URL at 



http://academic2.american.edu/~jpn olan/ stable.html and the corresponding papers of Nolan in [Section 
VII] of Alder et al. [1998j | for details of the implementation. 

The advantage of modeling the inter-day level shifts between the open and close of a market (ignoring 
weekends and end of segment - roll over effects) is that a statistical model of the historical behavior of 
these shifts, allows us to incorporate these inter-day shifts into the CVAR model which will improve the 
estimation of the parameters. This framework allows one to consider updating the statistical a-stablc fits 
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sequentially over time based on the entire price series or a rolling time window based on contract lengths. 



4.1 a-stable Empirical Assessment 

In this subsection, we first fit univariate a-stable models to historical price series data to assess if there is 
evidence for modeling inter-day level shifts via an a-stable distribution in the differenced price series. If 
the series indicates substantial deviation away from the standard CVAR model assumption of a Gaussian 
model (a = 2, ft = 0), then a composite mixture model for the errors proposed in Eq uation 15 . 1 1 becomes 
tenable. Otherwise, since the Gaussian distribution is also contained in the stable family, th e model we 



Peters et al. [2010a . 



propose reduces to the standard CVAR cointegration Bayesian model in 

Since we are analyzing the inter-day price shifts, the analysis is performed by first extracting 'daily' 
close/open differenced price series for each asset pairs inter-day price shifts. Daily here refers to the times 
when both markets for the pairs are first jointly open, or when the first market closes. Data consists 
of 10 minute level price data. The assets considered are AUD as Australian Dollars, CD as Canadian 
Dollars, FV as a US five year note, NQ as the NASDAQ mini and TU as a US two year note. In total 
each asset pair considers 30 contract segments, with varying numbers of days present and consecutive 
segment periods in time (a segment ends when a contract rolls over for one of the assets). Figure 1 shows 
each assets differenced price series Ax t = Xt — Xt-i from open of market each day to close of market each 
day, including the associated level shifts at the close/open day boundaries, for the 30 contract segments 
in the base currency units. We then extract these day interval differenced level shifts elements and fit 
them independently for each asset with a a-stable model. The parameter estimation results for the a- 
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Figure 1: Plots of differenced price series for the 30 contract segments. 



stable model comprised of level shift data for inter-day boundaries, in the 30 segments of each asset, are 
provided in Table [TJ The result s are reporte d for the SO parameterization for estimates obtained via 
Maximum likelihood procedure of lNolan [1997j . The approach we propose here is flexible and can involve 
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Asset i.d. 


# days 


Period 


a 


P 


7 


6 


AUD 


1535 


05/09/99- 


30/11/05 


1.833 (0.07) 


0.019 (0.34) 


195.365 (9) 


5.1510 (17) 


CD 


1535 


05/09/99- 


30/11/05 


1.666 (0.08) 


0.028 (0.20) 


97.344 (5) 


-4.699 (8) 


FV 


960 


05/09/99- 


18/08/03 


1.855 (0.08) 


-0.551 (0.42) 


105.134 (6) 


15.922 (11) 


NQ 


1054 


05/09/99- 


02/12/03 


1.254 (0.09) 


0.009 (0.14) 


313.678 (23) 


1.673 (31) 


TU 


960 


05/09/99- 


18/08/03 


1.807 (0.09) 


-0.059(0.37) 


88.119 (5) 


-0.088 (10) 



Table 1: Estimated Maximum Likelihood parameter estimates and in brackets the half interval 95% 
confidence intervals for the estimated parameters. 

fitting the stable model to any sub segment of data required, with different stable parameter estimates 
per data segment. We assessed the stable fits over time by successively adding blocks of 100 days price 
shifts to the series and refitting the a-stable distribution. This produces an assessment of the stability 
of the fitted distributions over time, we found parameter estimates to be fairly constant over the time 
periods considered in Table [TJ 

The result of this analysis suggests that it is clearly suitable to consider modeling the inter-day level 
shifts as distinct from a Gaussian innovations. The analysis shows that for each of the assets, the a-stable 
shape parameter has 95% confidence intervals which do not contain the Gaussian case a = 2, even with 
large historical data sets. Furcthcrmore, in the case of the Canadian dollar and the Nasdaq mini index, 
the value of a obtained implies a signifcantly heavy tail model is appropriate. Additionally, several series 
demonstrate asymmetry, violating the assumptions of Gaussianity at these inter-day bound ary points and 



Chen and Hsiao [2010l | can be invalid 



also demonstrating that the symmetric simplification proposed in 
in many real data settings. 

4.2 Influence of Non-Gaussian Level Shifts on CVAR Estimation 

In this section we study the impact on parameter estimation for the CVAR model when failing to appro- 
priately model the inter-day level shifts. To achieve this we consider synthetic data generated from the 
p air series, (d = 2) CV AR model in Section[3]with rank r = 1, lag p = 1, identification constraint specified 



in 



Peters et al. [2010a| and parameters specified as: j3 = [1, -1]; a 1 = [-0.002, 0.001]; £ = 100 x I 2X 2 and 
M= [0,0]. 

To assess the impact we generate two different groups of data series. The first consists of 100 in- 
dependently generated data time series realizations of length T = 200 using the above specified CVAR 
model parameters, with Gaussian innovation errors at all times, the standard CVAR model. The second 
consists of 100 independently generated data time series realizations, T = 200. The difference is that the 
noise model is now given for each asset by 

ejP ~ A/" (0, 10) I (tt t) + Si.e (0, 97, -4.7) I((6t), (4.2) 

where a-stable parameters are based on those estimated historically for the Canadian Dollar inter-day 
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level shifts, see Tabled] For the sake of comparison, the same Gaussian innovations are used in the rest 
of the time series other than those falling on a time period in which a-stable innovation is generated. 

In Figure 2 we show example comparisons of the raw price series for the model with pure Gaussian 
innovations (dashed line) versus the equivalently generated a-stable mixture generated price series (solid 
line). In this synthetic example, we take r — {t ; s.t. mod(t, 20) = 0, Vt G 1, . . . , T} which is equivalent 
to taking every 20-th noise sample from the a-stable model fitted to this asset on historical data. 




Figure 2: Plots of pair raw price series for data set 1 (dashed line) standard Gaussian noise CVAR model; 
(solid line) Stable + Gaussian noise CVAR model. 

For each of the synthetically generated groups of 100 data sets we estimate the parameters of the 
CV AR model. We compare a Ma ximum Likelihood based procedure, known as the Johanscn procedure, 



see iJohansen and Juselius [1990i | , to a Bayesian estimation. The Bayesian CVAR model we consider 
utilized vague priors for all parameters, so that the likelihood would drive the parameter estimation. 
The posterior sampling for /3 parameters was p erformed via an adaptive MCMC algorithm to estimate 
the MMSE, as specified in lPeters et al. [2010aj . That is we estimated under assumed knowledge of the 
cointcgration rank r = 1, the nine parameters corresponding to E, c*,/3 and fi. Both of these models 
estimation procedures do not account for the a-stable noise impurity introduced, hence we can assess the 
impact of such noise on the parameter estimates. 

Figure 3 displays the histogram of the estimated cointcgration vectors free parameter Px,2 after a 
normalization and identification constraint, under both the Johansen procedure MLE and the Bayesian 
minimum mean square error (MMSE) estimate, for each data set in each group. The true parameter value 
used in the model to generate the data was 0i2 XJE = — 1. The dashed line in the figures represents the av- 
erage MMSE estimate for fti^ over the 100 data sets. The results in top sub-figures compare the parameter 
estimates for the CVAR model generated with a Gaussian innovation noise (LEFT - Bayesian Estimates; 
RIGHT - Johanscn Estimates). The Johansen procedure produced several estimates which were poor 
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which effected the mean parameter estimate, see discussion on this point in 



Johansen and Juselius [199 1 




Figure 3: Plots of estimated parameter /3i t 2 for 100 data sets with and without a-stable inter-day noise. 
TOP LEFT: standard Gaussian noise CVAR model estimated MMSE. TOP RIGHT standard Gaussian 
noise CVAR model Johansen MLE. BOTTOM LEFT: a-stable + Gaussian noise model estimated MMSE 
(ignoring stable noise presence). BOTTOM RIGHT: a-stable + Gaussian noise model Johansen MLE 
(ignoring stable noise presence). 

The results demonstrate that both the Johansen and the Bayesian estimates for the cointcgration 
vector /3 in each of the 100 data sets are severely affected by the presence of the inter-day jumps, 
modeled here by the a-stable noise. Therefore to avoid bias in the parameter estimates obtained in the 
CVAR model, one must appropriately model the inter-day level shifts in the price series. 



5 ABC Bayesian CVAR Models 

Here we extend the class of Bayesian CVAR models presented in Peters et al. [2010a and Sugita [2OO9I ] 
to include the composite a-stable noise model developed. This will allow us to then formulate a Bayesian 
estimation procedure for the parameters in this model. In doing so we are able to estimate the parameters 
of the CVAR model with out the bias introduced by inappropriate model assumptions as assessed in Sec- 
tion l4.2l Note that due to the fact that the general a-stablc model does not admit a tractable density, this 
directly impacts on the ability to apply the standard Johansen procedure, as the likelihood can no longe r 



be evaluated point-wise. Alternatives in such cases include indire ct inference, see 
This would generalize the symmetric simplification proposed in 



Gourieroux et al. [1993 1. 
Chen and Hsiao [20ic| . 



Instead we formulate a novel ABC or approximate Bayesian computation (ABC) solution. ABC 
Bayesian modeling is a new class of statistical techniques specifically designed for modeling when the 
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likelihood and thus the poster ior distribution is i ntrac table. These have now been studied and ap- 

Peters et al. [2010b| and 



plied in a range of settings, see 



)6] 



Peters and Sisson [2006] for ABC modeling for 



financial risk and insurance contexts. In addition, there are now several methodological papers and 
reviews available for this new class of modeling techn ique, see iPeters et al. [2008 ^ 



Fearnhead and Prangle [201 



0], 



Beaumont et al. [2009] and the review of 



Tavare et al. [19971 . 



Sisson and Fan [201C [ 



We make identical model assumptions and restrictions for the Bayesian CVAR model as in 



Peters et al. [2010a |. 



In particular, for any non-sin gular matrix A, th e matrix of long run multipliers IT = a(3' is indistinguish- 
able from IT = aAA~ 1 /3', see Koop et al. [200(1 . We remove this problem by incorporating a non unique 
identification constr aint by imposing r 2 restrictions as follows f3 = [I r ,/3i]', where I r denotes the 



identity matrix, see 



00*] ]. Wc first specify our prior structure and then sepa- 



r x r 



rate the problem into two sub cases, the symmetric a-stable case and the general skewed a-stable model. 
We present the Bayesian model for estimation of (3, B and E conditional on the rank r under each of 
these settings. 



5.1 Prior 

The prior model is identical to the choice of IPeters et al. [2010al | and ISueita [2002| , which produces 
conjugate posterior distributions for matrix variate parameters E and B. In the new composite noise 
model we develop we must re-derive the Bayesian models in the presence of the a-stable inter-day noise. 
In general conjugacy is lost for the general asymmetric noise models in Equation l5.il However, we derive 
a novel conjugacy under transformation in the symmetric case via a scaled mixture of Normals (SMiN) 
representation of the a-stable inter-day model. 

• (3' ~ N(j3' , Q ® H ~ 1 ) where N(f3, Q ® -ff _1 ) is the matrix-variate Gaussian distribution with prior 
mean /3, Q is a (r x r) positive definite matrix, H a (n x n) matrix. 

• E ~ IW(S, h) where IW(S, h) is the Inverse Wishart distribution with h degrees of freedom and S 
is an (n x n) positive definite matrix. 

• _B'|E ~ N(P', E ® A -1 ) where N(P, E <g) A -1 ) is the matrix-variate Gaussian distribution with h 
degrees of freedom and S is an (n x n) positive definite matrix. 



5.2 Derivation of a Conjugate Matrix-variate SMiN Bayesian CVAR Model 

In this section a novel matrix variate Bayesian conjugate model is derived for the mixture of noise pro- 
cesses in the ECM framework. Lemma 1 and Lemma 2 combined with Theorem 1 demonstrate that 
under a specifically designed transformation of the vectorized matrix of observations, we can obtain a 
joint likelihood for the a-stable and Guassian innovations mixture model in the un-vectorized matrix 
variate observations which is matrix-variate Gaussian with explicit covariance matrix under the transfor- 
mation. This will be critical as we wish to obtain a Bayesian conjugate model for the posterior matrix 
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parameters. In addition to the covariance structure, Lemma 3 and Theorem 2 then derive the form of the 
mean matrix for this matrix variate likelihood, via a well known tensor product identity on vectorized 
transformed data. To achieve this we consider a special form of non-negative tensor factorization of our 
transformation matrix. In addition we prove that the solution to the mean struture parameter matrix in 
the transformed model, can be uniquely recovered under the transformation developed, given estimates 
of the transformed parameters. Therefore we can sample the transformed posterior distribution and 
then invert posterior samples via the transformation to obtain un-transformed samples uniquely. Finally, 
Theorem 3 derives the conjugate model for the matrix variate parameters of the posterior under the 
transformation. This is meaningful as it allows us to exploit existing results developed for matrix variate 
distributions in the CVAR ECM framework. The other important resuls is that this allows us to reduce 
the posterior dimension significantly, as we do not need to parameterize the posterior covariance matrix 
for the vectorized observations which would be dimenson nT x nT, instead allowing us to work with an 
n x n posterior matrix. Clearly, a significant dimension reduction, especially in the setting of financial 
data, where the number of data points T >> n is of the order of 100's to 1,000's. 

When the noise model in Eauation l5.1l is strictly sy mmetric, ie. th e a-stable inter-day noise model is 



Godsill [2000l | . This involves models of the form 



symmetric, it admits an exact SMiN representation, see 

ef - M (o, I(t£r) +]\f (d« , 7 «A W ) I (t G t) , (5.1) 

with auxiliary scale variables distributed as \W ~ <S a(i) , 2 (0, 1, 1). 

For simplicity we assume a lag p = 1, this can be extended trivially under our framework. We will 
first take all the vector observations for times 1 to T (mixed sets of (inter-day) SMiN and standard 
(intra-day) CVAR Gaussian innovation noise random vector observations), denoted yx/r with dimension 
T x n, which will have a log-likelihood model given by 

£(E,B,/3, X,a,-f,S;y 1:T ) 

= log ((27r)-°' W |E <g> I { \-°- 5 exp [-O.Wec{Y - WB)'^- 1 g) I^)Vec(Y - WB)X\ I (t <£ t) 

+ log ((27r)-°- 6 "( T -^ \D X g I {T _- t) I" ' 5 exp (-0.5Vec( X - W {T _~ t) B)' ((D,)- 1 8 J" 1 ^) Vec(( x - W (T _ i} B)))) I(i G t) 

(5.2) 

where Y represents the matrix of observation differenced price vectors corresponding to intra-day prices 
with a total of t rows and W is the corresponding matrix for Y. In addition \ = ^-y ~ lfc<^ T corre- 
sponds to the inter-day observation matrix of observation differenced price vectors not including rows 
for Y after subtracting of the location parameters for each a-stable fit, given by S = [5^, . . . ,5^]'. 
The definition of Wr T _^\ is the matrix for W corresponding to the observation vectors taken from the 



A 1 7 1 , . . . , A n 7™ 



are the scale parameters in the 



set of intra-day times when t £ r. The vectors A 
SMiN representation and D\ is a diagonal matrix with each value of A in the diagonal. This vec- 
torized representation is instructive to understand the model, however to exploit conjugacy present it 
will be beneficial to re- represent the likelihood in a matrix variate decomposed form specified in Lemma 1. 

Lemma 1 . Utilizing the assumption of conditional independence of the observation vectors given 
model parameter matrices and vectors E, B, (3, A, a, 7, S which states E [y s ,yt] = E [y s ] E [y t ] Vs, ts^t 
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and additionally the results from I Gupta an 



d Nasar [1994] ([Theore m 2.2.1], [Theorem 2.3.11]) and the 



Gupta and Naear [1999] [Theorem 1.2.21 (v and x)] we can 



trace identity and determinant identities of 
specify the complete grouped vectorized likelihood. That is we consider a reordered version ofY = y\-T 
denoted generically by Vec(Y*) ~ N n x (Vec(M^), £* ® which gives the grouped likelihood model: 

L(S,B,f3,X,a, 7 ,S;Y,) 

= (27rr°' 5nT |E, ® *,r ' 5 exp (-0.5Vec(F» - M*)'(£* 1 ® ^~ 1 )Vec(Y, - M«)) (5.3) 
oc |E»r°- 5T |*»r°' 5n exp(-0.5tr{E; 1 (y» - D, — W / *B)'*^ 1 (Y» -» 
where we have ordered the observation vectors 

y* = yi-.T = [yiV2 ■■■ y Tl -i y Tl +i ■ ■ yr y Tl y T2 ■■■ y Tl J 

and there are a total of id inter-day boundaries in the series. In addition we dehne the appropriate 
likelihood matrices as follows for a general covariance matrix structure £* <8> \&* (for £» a n x n matrix 
and 'J* a T x T matrix), 

( \ I W 

\ l tD 8 T ) \ W [T _~ t) 

We now present some remarks about grouping all observations from intra-day and inter-day into a single 
matrix-variatc Gaussian likelihood model. 



Remark 2: Lemma 1 states that generically the observations can be reordered to form the (n x T) 
Gaussian random matrix Y* with the Erst t columns corresponding to the intra-day price differences 
and the remaining T — t columns from the SMiN observations. In addition we can represent the matrix 
variatc Gaussian as having a covariance structure given generically by X* <g) where £» corresponds 
to the row dependence and VP* captures the column dependence. Lemma 1 also presented the required 
mean structure for this combined matrix-variate likelihood. 



Remark 3: To relate the matrix variate Gaussian model, obtained from Lemma 1, to the original 
likelihood model in 15.21 we need to find a relationship to identify the sufficient statistics matrices, £* 
and 'I', with the original likelihood model. Under this reordered and repacked matrix variate Gaussian, 
the independent columns of the random matrix is no longer true, that is "J* is only diagonal when D\ = £. 



Remark 4: Maintaining the conjugacy structures developed in 



Peters et al. [2010a] [Section 3] and 



Susita [2002] . for the standard matrix variate Gaussian noise Bayesian CVAR model is beneficial for 
inference and sampling. This would require us to identify the sufficient statistics, (AT*, £*, 'J*), for the 
grouped matrix variate Gaussian model in Lemma 1, and to have £* = £ and ^P* diagonal, as this 
will preserve conjugacy results, conditional on parameters from the Gtted a-stable SMiN intra-day noise 
model. This would allow us to specify a matrix variatc prior only on a matrix £* which is n x n rather 
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than a multivariate covariance which is nT x nT thus providing a significant dimension reduction in our 
posterior model parameters to be estimated. 



Lemmas 2, 3 and Theorems 1, 2 and 3 allow us to identify the sufficient statistics and then transform 
the vectorized random observation matrix to recover the desired conjugacy properties discussed in 
Remarks 2, 3 and 4. 



Lemma 2. Using [Definition 2.2.1] and [Theorem 2.2.1] of \Guvta and Nasar [199^] . the random 
vector Vec(Y t ) is conditionally a multivariate Gaussian random vector of dimension nT x 1 . Using Lemma 
1 and the SMiN CVAR model assumption of conditional independence, but not identically distributed, 
Gaussian observation random vectors we can explicitly identify the mean and covariance structure of the 
vectorized observation matrix Vec(Y*) in terms of the original CVAR model matrices as follows, 



Cov{Vec(Y*)) = E* ® *, 



E® It 





In addition we can obtain the covariance ofVec(Y[) as 




Cov{Vec{Yi)) = *, 



It <g> E 







l (T-i) 



Having identified the covariance structure for the vectorized reordered observation matrix, we present 
Theorem 1 to address Remark 4 which pertains to maintaining a likelihood structure that will admit 
conjugacy under the priors presented in Section [5. II 

Theorem 1. Given Lemma 1 and Lemma 2 which provide us with a (nT x 1) random vector Vec(Y*) 
conditionally distributed according to a multivariate Gaussian distribution, under a transformation by 
a nT x nT matrix we can obtain a transformed ran dom vector denoted Vec(Z*) = Q*Vec(Y*) which 
is also multivariate Gaussian. Using Luetkevohl (20oJ/ [Proposition B.2] we obtain, for 



Vec(F*) ~ N (Vec(M*), £* ® ^*), a transformed random vector 
Vec(Z«) = Q*Vec(Y*) - N{Q*Vec(M*), Qj(E* ® *«,)£*)• 
If we select the transformation 

In ® k 

Q(g> I {T _ t) 

then we can obtain a particular form for the n x T un-vectorized random matrix for Z* which has 
a covariance structure based on the original covariance for the Gaussian inter-day innovation noise E. 
That is we obtain Z* ~ N ni T(jJ*+, E, It). In addition we can define 

It ®In 

I (T _ i} ® i 

such that when it is used to transform Q*tVec(Y[) we obtain Q*tVec(Y^) ~ N u ^t{^*, It, E) and we also 
have that Z'^ = Q* t Vec(Yl). 
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Proof: To prove the covariance structure of the transformed random vector under this particular trans- 
formation has this special tensor product factorization we consider the new covariance structure for 
Vec(Z^) which will be given by 



Zov(Vec(Z*)) 




(T-i) 



We can therefore obtain Cov(Vec(Z*)) = T,<S>It by solving the equation Q T D\Q = £ for matrix Q. We 
can make use of the fact that the n x n matrix D\ is diagonal and the covariance matrix £ is real and 
symmetric with an eigen decomposition £ = VFV T with diagonal eigen values matrix F. Therefore if 



we select Q = S^U T where is the diagonal matrix with the elements Su = y j^'. then the matrix 
U is the orthonormal matrix of cigen vectors for S, that is U = V. The proof for the transformation Q vl 
of Vec(Yl) follows trivially from this result. □ 

Hence, we have transformed the observation vector Vec(Y*) via matrix Q„ to obtain a new random 
vector which when un-vectorized produces a matrix variate Gaussian with row dependence given by £ 
and column dependence given by It- This therefore recovers the conditional independence property of 
each vector observation whilst identifying under the transformation the identity £* = £ and "J"* = It- 
Therefore the matrix variate likelihood for transformed observations Z\-t is given by Lemma 3. 



Lemma 3. Under [Definition 2.2.1] and [Theorem 2.2.1] of \Guvta and Nasar [199^] . the likelihood 
of the transformed observations is given by 

L(E,B,f3,X,a,y,S,Q;z 1[T ) 

<x |E* ® / T r - 5 exp(-0.5(Vec(Z») - Q,Vec(D, - W,B))' (E" 1 ® I^ 1 ) (Vec(Z«) - Q,Vec(D, - W,B))) 

Then applying the identity in [Theorem 1.2.22] of Gupta and Nasar (199sj/ . given by 



{B 1 ® A) Vec(X) = Vec(AXB), (5.4) 

we can rearrange the mean structure of the likelihood model. We can make an arbitrary choice of 
factorization of Q* into the form Q* = G <g> H with the only constraints that G is (p x n) and that H is 
(q x T) dimensions, with pq — nT. There are several solutions to this class of tensor factorization, we 
will present our factorization in Lemma 5. Hence, we rearrange the mean structure in the likelihood as, 

L(E, B, (3, A, a, 7, S, Q; z 1:T ) 

oc |E» <g> I T \~ - 5 exp (-0.5 (Vec(Z*) - Q,Vec(D t - W*B))' (E7 1 ® I? 1 ) (Vec(Z») - Q«Vec(D, - W*B)j) 
<x |E» <8> ir|" ' 5 exp (-0.5 (Vec(Z„) - Vec{D t - W*B)^j (E^ 1 ® I^ 1 ) (Vec(D t - W*B))) 

(5-5) 
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defining = HD*G T ', W* = HW* and B = BG T . This then allows us to re-express the likelihood 
model in the form 



p(zi: T \T,,B,P,X,OL,i,S,Q) 



|E»r 5T |/ T r - 5n exp (--tr^Z^Z* - D* - W*B)'{Z, - D* - W«B)} 
|E*r°' 5T exp f -itr (I, + (B — %,)'W',W,(B - I,)) | 



(5.6) 



with B» = {w'JV^j 1 Wl(Z.-D„) and I* = (z» - £>* - (z» - -D* - . If lags ofp > 1 

are of interest, this approach can still he used, but the block diagonal covariance matrix will involve more 
sub-blocks. 

We can now comment on the possible solutions to this tensor factorization. 

Remark 5: Typically the basic Singular Value Decomposition is applied to perform a tensor factoriza- 
tion - but this will be difficult in our setting as we are required to enforce the sub-matrix constraints that 
the Grst factored matrix must be (p x n) with n columns and the second q x T with T columns. Another 
solution would be to search over all subspaces for the p and q combinations to satisfy pq = nT for a set of 
matrices that minimizes a matrix norm. There is a rich literature on such tensor factorizations and the in- 
terested reader is referred to numerical algorithms for rank-k tensor approxi mations which generalize th e 
SVD such as the orthogonal tensor decompositions (Higher-Ord er SVD) of 



De Lathauwer and Vandewalle [2004] or 3- way decom positions ouHarshman [19701 k nown as PARAFAC 



and the Non-Negative Tensor Factorization (NTF) in 



Shashua and Hazan [20031 . 



Friedlandera and Hatzb [2008] . 



In Theorem 2 we provide a specific tensor factorization to satisfy Lemma 3. It is important to 
obtain a specific factorization which allows us to decompose the transformation matrix into a tensor 
factorization which admits at least one solution to the original mean structure for B. When multiple 
solutions are present we can handle this in our Baycsian framework through imposing constraints post 
sa mpling, as typically performed in these situations in which there are complications with identifiability, 



see 



Celeux et al. [200q . We can provide a unique solution for the original mean structure for B' given B' . 



Theorem 2. Given transformed observations, Z' t , we obtain an analytic tensor factorization for 
the transformation matrix Q* t satisfying the dimensionality constraints on the tensor factors in Lemma 
3, given by 

T T 

Q*t = ^ ^ Uij ® Qi,j 

i=l j=l 

where Qij represents the (i, j)-th sub-block of dimension nx n in the nT x nT transform matrix Q*t and 
Uij represents the (T x T) matrix whose ij-th element is 1 and whose remaining elements are 0. This 
particular choice of factorization ensures that a unique solution to B' is attainable given B. This will be 
particularly important for the conjugate Baycsian model in Theorem 3. The mean structure under the 
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transformation is given by, 



T 

E 

i=l 



E[Vec{Zi)} = Q* t Vec(Di - B'W'J 

Vec(Q u (D', - B'WM) 

This allows us to make explicit the mean structure of the matrix variate transformed data likelihood 
U 

Yh=1 QiiB'. 



of Lemma 3 by identifying the following elements = J2i=i QuD'*U'ii> = Y^i=i W'JJn and B' 



Proof: Using the identity [(1-29) p. 343] of lHarville [20081 ] we can exploit the fact that the trans- 
formation matrix we have selected is a square nT x nT matrix which has a n x n block diagonal 
structure. Hence we will consider the following structure in Q*t 

T 



( 



Qn Q12 



QlT \ 



\ Qti Qt2 ■ ■ ■ Qtt J 

with each sub matrix Q^ being selected as (n x n) matrix. We can then obtain the following tensor 
factorization, using the fact that all Qij matrices will be comprised of elements other than those with 
i = j giving a sparse representation 

T T T 

^ Qij 



Q* 



EE^. 

i=l 3 = 1 



E^ 

i=i 



Qi 



As above, Uij represents the (T x T) matrix whose ij-th element is 1 and whose remaining elements are 
and we have used the fact that we have specifically selected the transformation matrix Q*t as n x n block 
diagonal. Under this factorization the mean structure we obtain in the likeliho od model in Theorem 1 
with application of the identity in [Theorem 1.2.22] of iGupta and Nagar [19991 ] shown in Equation 15.41 
is given by 

E \Q*tVec{Yl)] = Q*tVec(D'„ - B'Wl) 

T T 

= E E ( u n ® Q«) Ve < D * - B ' w *) 
i=i j=i 

T 

= Y, VeciQuDiUli - QuB'WiUi)) 



This allows us to make explicit the mean structure of the matrix variate transformed data likelihood by 
identifying the following elements D'* = J2l=i QiiD'JJ' H , Wl = £^=i W'JJ' H and B' = Y^=i QUB'. 
Finally, we note that we can uniquely solve the system 

T 



B' 



i=i 



17 



for B' given B'. This is due to the fact that the matrices Qu for i < T are constructed from identity 
matrices and the case of i = T is constructed in our transform as a real matrix of eigen vectors of 
covariance matrix E, which is therefore invertible. We can therefore obtain the unique solution for B' as 

B' = & ((T - 1)7„ + Qtt)- 1 . 

□ 

Under the transformed observation vector we utilize an identical prior model for the transformed mean 
structure as specified in Section [5. II to obtain conjugacy for the transformed prior-likelihood model. 

Theorem 3. Under Theorem 1, Theorem 2, Lemma 1, 2 and 3 and conditional on parameter 
estimates of the multivariate a-stable statistical model, S a (/3,7, S), fitted to historical price series inter- 
day level shifts for each asset in the CVAR model, the following posterior conjugacy properties are satished 
for the prior choices in Section I5.it 

Conditional 1: Conditional on the re-arranged un-transformcd subset of observation vectors from intra- day prices 
matrix Y we obtain an Inverse Wishart distribution for 

p(£|/3, A, a, 7 , 8, Y) oc |^|(*+>>)/2| E |-(t+fc+n+l)/2 cxp (-O.S^E" 1 ^)) ; 
where S Y is dchned to be given by 

S Y = S + S + {P-B)' [A- 1 + (WW)' 1 ] -1 (p-Sy 

Conditional 2: Under the SMIN model and conditional on the re-arranged transformed complete vector of ob- 
servations for intra and inter-days, Vec(Z^) = Q*Vec(Y*) we obtain a Matrix-variate Gaussian 
for 

p(B\/3, A, a, 7 , S, E, Z*,Q*) oc \A Z . |"/ 2 |E|- fe / 2 exp f-0.5tr (e" 1 ^ - B Z J A+{B - B z .) 



where A Z ,=A + W»'W* and B Zt = (A + W'JV* ( AP + W'^W^B 



Conditional 3: Under the SMIN model and conditional on the re-arranged transformed complete vector of ob- 
servations for intra and inter-days, Vec(Z se ) = Q*Vec(Y*) we obtain the marginal matrix-variate 
posterior for the cointcgration vectors, (5 given by 

p(J3\\, a,7, S, Z*,Q*) oc p(/3)\S z » \^ t+h+1 ^ 2 \Az, \~ n/2 - 

for 



S Z ,=S + S* + (P- B*)' A- 1 + (WiW*)- 1 
and A z , defined in Conditional 2. 



"P-B. 
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Conditional 4: Under the SMIN model we obtain a the marginal distribution for each random variable A 1 in the 
n x 1 random vector X given by 



p(\i\a,<y,5,x,B,Q*,f3) oc J| jV(ej;0, A i7 i) * S a . /2 (A*; 0,1,1) 



where for all t S r we define e\ = \i,t 



W {T _~ t) B 



The proof for the conjugacy for Conditional 1 and Conditional 2 arc provided in 



Sueita [2002] [Section 



2.2, Equations 10 and 11] as a direct consequence of Theorem 1 and Theorem 2 and the t ransformation 



developed and conjugate prior choices. The derivation of Conditiona l 3 also follow s from lSugita [2002 



[Section 2.2, Equation 14]. The proof for Conditional 4 is presented in lGodsill [2000] [Section 2 Equation 
4] . We will later demonstrate in Section [6] how this conjugacy can be beneficially utilized as a proposal 
distribution in a ABC general non-symmetric a-stable Bayesian CVAR model. In addition we will provide 
novel algorithms to sample from the resulting posterior distributions also in Section |B1 



5.3 General a-stable Approximate Bayesian Computation CVAR Model 

Under the noise model presented in Equation (|4.2p we have an intractable matrix- variate likelihood model 
since the asymmetric a-stable inter-day model does not admit a density. Hence, our noise model for the 
i-th scries at time t becomes, 



,(<> 



■ N (o, o-W) I (t £ r) + S aW (& (i) ,7 (i) , * (0 ) l{ter). 



(5.7) 



In this section we develop an ABC model and associated Markov chain Monte Carlo (MCMC-ABC) 
sampler to perform estimation in this general composite C VAR noise model se tting. MCMC-ABC sam- 
plers are actively stu died in the statistical literature since iTavare et al. [1997j , see a review chapter in 
bisson and Fan [201o| . 



ABC inference adopts the approach of augmenting the target posterior distribution from the in- 
tractable "True" model, denoted p(£,-B,/3|Y) oc p(Y|£, B, (3)p(E, B, /3), into an augmented target pos- 
terior distribution. The ABC posterior model approximation, denoted pabc B, f3\Y), is therefore 
defined by, 

Pabc (£, B, 3, Y S \Y) = p{Y\Y Sl S, B, f3)p(Y s \X, B, /3)p(E, B, (3) (5.8) 

where the auxiliary parameters "synthetic observation" matrix Y$ are a (simulated) dataset fromp(Yg|£, B, /3), 
on the same space as Y . The function p(Y\Ys, £, B, (3) is chosen to weight the posterior p(S, B, (3\Y) with 
high values in regions where Yg an d Y are similar. There are many choices for this function discussed 



and studied in 



but generally it is assumed to be constant with respect to parameters 



E, B, (3 at the point Y$ — Y, so that p(Y|Y, S, B, (3) = c, for some constant c > 0, with the result that 
the target posterior is recovered exactly at Y$ = Y . That is pabc (£ B,3,Y\Y) = p(£, B,(3\Y) 
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Given the augmented ABC posterior distribution pabc (£, B, f3, Ys\Y) generally inference involves 
the marginal posterior, 

PABC (X,B,f3\Y)Kp(Z,B,p) f p(Y\Y s ,V,B,p)p(Ys\X,B,f3)dYs (5.9) 

obtained by integrating out the auxiliary dataset. The ABC distribution pabc (S, B, f3\Y) then acts as 
an approximation to p (£, B, (3\Y) and is obtained in practice by discarding realizations of the auxiliary 
dataset from the output of any sampler targeting the joint posterior pabc (S, B,/3, Ys\Y). 

Generally, the weighting function p(Y \Yg, S, B, /3) is simplified in two important ways, the first in- 
volves replacing the observation and synthetic data vector / matrix with summary statistics and the 
second involves making a kernel approximation to the weighting function. Therefore we obtain a kernel 
representation of the form 

Pe{Y\X,Y,,B,f3) = -A 
e 



see 



Peters et al. [2010b| . Irlatmann et al. [2009 | and iBeaumont et al. [2009| . In this simplification the 



data matrix Y is replaced with summary statistics (ideally sufficient statistics) vector or matrix denoted 
S(Y) of significantly lower dimension than Y . When sufficient st atistics are not available, then summary 



Fearnhead and Prangle [201 



01. 



Pe (Y\Y s ,Z,B,(3) 



statistics are utilized at the cost of bias, see recent discussion in 

We consider a hard decision kernel weighting function (uniform kernel) with Euclidean L2-norm 
distance measure between summary statistics on vectorized observation matrices Vec(Y) and Vec{Yg) 
given by 

' I if \\S(Vec(Y)) - S(Vec(Y s ))\\ < e 
otherwise 

Remark 6: For sufficient statistics and as e — > it has been proven that an MCMC-ABC sampler 
with this kernel, will obtain correlated sa mples from the station ary regime given by the target posterior 
distribution p{T,,B,(3\Y), see a review in \Sisson and Fan [201 w . 

Remark 7: The model we propose is highly non-standard in the ABC literature since it involves a 
combination of likelihood components some of which are tractable and others which are intractable. This 
opens the possibility of many alternative sampling approaches, for example we could compute the likelihood 
for the tractable portions of time and then approximate the likelihood for the portions of time in which 
the noise model produces an intractable likelihood. 

The particular algorithm we consider in Section [6] will demonstrate how to combine both the SMiN 
and ABC Bayesian CVAR models developed. In particular providing a general adaptive MCMC based 
sampling algorithm for matrix variate a-stable CVAR posterior distributions in the approximate Bayesian 
computation setting. This involves use of the conjugate models derived under the SMiN assumption as 
proposal distributions in the ABC sampler, reducing the required dimension of the adaptive proposal 
kernel in our MCMC sampler. 

Hence we have developed two novel Bayesian modeling frameworks for incorporation of the a-stable 
model in the CVAR model framework. We can now consider inference and sampling under these models. 
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6 Sampling and Estimation 

Here we focus on obtaining samples from the matrix variate posterior distributions derived in Section [ 
We will achieve this via design of a novel sampling methodology we develop based on adaptive MCMC 
in a ABC setting. It is a hybrid algorithm since the proposal distribution for several of the posterior 
matrix variables (S,_B) in the ABC sampling framework are sampled via the conjugate model derived for 
the symmetric a-stable case in Theorem 3, which in this case acts as a proposal for the non-symmetric 
model in the ABC framework. The remaining matrix posterior parameters (/3,A) are sampled via an 
adaptive Metropolis and adaptive Rejection Sampling framework. The proposal are combined into the 
ABC methodology as presented in Algorithm 1. 

6.1 Hybrid Adaptive Markov Chain Monte Carlo ABC. 

Here we present the sampling methodology for posterior pabc^, B, f3\Y) conditional on estimation of 
the a-stable inter-day parameters on the batch of data Y under consideration. Note below we present a 
version of the HAdMCMC-ABC algorithm in which all matrix parameters are updated at each iteration 
of the Markov chain, however, block Metropolis-within-Gibbs frameworks are trivial to also consider. 
The resulting proposal distribution for the MCMC sampler comprises a hybrid proposal comprised of a 
conjugate posterior proposal under the symmetric a-stable setting and an adaptive Metropolis proposal. 
Proposing to update the matrix variate Markov chain parameters from iteration j — 1 to iteration j 
involves sampling proposal {£,£?, /3, A} given Markov chain state {T>, B, (3, X} [j — 1] according to the 
proposal, 

= p(£|/3, A, a, 7 , S, Y)p(B\/3, A, a, 7, S, £, Z„, Q*) P (\ t |a, 7, 8, X, B, Q*,(3)q((3, f3[j - 1]) 

where the first three proposal distributions for the Markov chain are given by Theorem 3 under a sym- 
metric a-stable intra-day assumption allowing them to be sampled exactly and q((3,f3[j — 1]) is given by 



the adaptive Metropolis proposal developed in 



Peters et al. [2010a| [Algorithm 2] 



7 Results and Analysis 

In this section we perform three studies. The first part involves numerical analysis of the algorithms 
developed to sample from the matrix variate posterior distribution on data sets generated with known 
parameters. This is performed in two settings, the first under a mixture noise model (Equation 15. 1|) 
with very heavy tailed symmetric a-stable inter-day noise (a = 1.3). In this case according to Theorem 
3, we know the exact posterior full conditional distributions. Therefore, sampling results from this 
model arc compared for the resulting exact MCMC sampler, denoted "Mixture Exact" versus a ABC 
approximation sampler generated under the ABC approximate model sampled via Algorithm 1, denoted 
" Mixture ABC" . In addition, we ignore the stable innovations and run the adaptive MCMC sampler 
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Algorithm 1: Hybrid Adaptive Markov Chain Monte Carlo Approximate Bayesian Computation 
(HAdMCMC-ABC). 

Input: Initialized Markov chain matrix variate states (o) = (e (0) , B (0) , /3 (0) , A< 0) ). 
Output: Markov chain samples {0O')} j=1:J = {S^, B^\j3^} j=1:J ~ p ABC (£, B, f3\Y). 
begin 

la. Set ABC tolerance level e (note annealing of the tolerance can be utilized), 
lb. Evaluate summary statistic vector for observed price series vectors S{Vec(Y)). 
repeat 

2. Sample conjugate proposals for matrix parameters (£,£?): 

2a. Sample proposed matrix state E* via inversion from conjugate posterior 
p(Y,\f3 (j ~ 1 \X^~ 1 \a^- 1 \f,d,Y), [Theorem 1: Conditional 1]. 

2b. Evaluate transformation matrix Q* based on proposed state S* and obtain transformed 
observation matrix Z*, [Lemma 3]. 

2c. Sample proposed matrix state B* via inversion from 

p{B\j3^- l \X^- l \a.^- l \^,8,Y l * , Z^,Ql), [Theorem 1: Conditional 2]. 

3. Sample adaptive proposals for matrix parameters (f3,X): 

3a. Sample proposed vector A* with each component sampled from p(Xi\a, 7, 5, x-i B, Q*, /3), in 
[Theorem 1: Conditional 4] via single component adaptive rejection sampling proposed in 
Godsill (2000) [Section 3.1.1., p.2] 

3b. Sample proposed unconstrained elements of matrix /3 from adaptive metropolis proposal in 
Peters et al. (2010) [Algorithm 2, p. 12]. 

4. Generate synthetic data set Y$ given proposal (£, B,/3, A) and fitted intra-day model S a (/3,j, 8): 

4a. Evaluate summary statistic vector for synthetically generated price series vectors 
S(Vec(Y s )). 

4b. Calculate weighting function in Equation 15.31 

5. Calculate ABC - Metropolis Hastings Acceptance Probability according to the general 
specification in Sisson and Fan (2010) [Equation 1.3.2] for joint proposal 9 = (£, B, /3, A): 



A(ewr)= PABC , < ,,■ > (6-2) 

v ) PABc{e(^)\Y) q {e^) ^e*) v ; 

Accept 0^ = 9* via rejection using A, otherwise 0^ = 6^ _1 \ Set j = j+1. 



until j = J 
end 



22 



and posterior models of iPeters et al. [2010al ] and lSugita [20oi| . denoted ("Gaussian"), to further assess 



the bias in parameter estimates if intra-day level shifts arc not modeled explicitly. We arc particularly 
interested in the estimated cointegration basis vector B which directly affects portfolio weights in pairs 
trading settings. 

The second study considers asymmetric heavy tailed a-stable (a = 1.3, (3 = 0.5) inter-day noise. In 
this case we can only compare the ABC model and MCMC results from Algorithm 1 to the case in which 
intra-day level shifts are ignored in the " Gaussian" case and sampling occurs as in IPeters et al. [2010a | . 
In the third study we consider a real data set analysis via our general MCMC- ABC sampler in Algorithm 
1, for a pair of assets, observed in practice to have a cointegration relationship with rank r = 1, with 
a-stablc fits from Table [J for AUD - CD. 

In all studies we consider pairs data, with a cointegration rank of r = 1. We ran samplers with 
10,000 burn in samples and 20,000 actual samples. In studies one and two we perform analysis on 20 
independently generated pairs of price data sets, with each price series of length 500 samples and every 
50-th sample modeled with an a-stable innovation. In the real data analysis we take the series described 
in Section 4.1. 

7.1 Synthetic Data Analysis - Symmetric Case 

The model used for this synthetic study considers parameter settings j3 = [1, 0.5], a = [0.1, —0.3], £ = I2 
fj, = [0, 0] and (a = 1.3 , j3 = 0, 7 = 1, S = 0). The prior settings for the Bayesian model are those specified 



in IPeters et al. [2010aj . The ABC tolerance level used was e = 0.1. In Table 2 we present the results 
comparing the performance of the estimation of the parameters for the resulting Bayesian posterior model 
in Theorem 3. The results demonstrate that the effect of ignoring the inter-day level shifts when fitting 
the Bayesian model has a significant effect on the estimation of the cointegration vector (3. In addition, 
it is clear that in this symmetric case, the estimates obtained via the exact MCMC sampler and the 
ABC approximation are similar. However, as expected, the computational cost for the ABC approach 
is significantly higher than the non-ABC approach. Wc also see that estimation of the other parameters 
are also accurate. We summarize the results for the cointegration vector (3 of the estimated MMSE in 
Figure 4 under the Gaussian case ignoring the intra-day level shifts and the mixture model proposed in 
this paper. 

7.2 Synthetic Data Analysis - Asymmetric Case 

The model used for this synthetic study considers identical parameter settings and prior settings for the 
CVAR model as the previous study, with the asymmetric inter-day noise model with a-stable parameters 
(a = 1.3, (3 = 0.5, 7 = 1, S = 0). The ABC tolerance level used was e = 0.1. In the asymmetric case we 
must work with the ABC Bayesian model. In Table 3 we present the results comparing the performance 
of the estimation of the parameters for the resulting ABC Bayesian posterior versus the basic Gaussian 
conjugate Bayesian model. Estimation results in Tabic 3 demonstrate significantly more accurate results 
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Gaussian model 


Mixture Gaussian and a-stable intra-day model 


Parameter Estimates 


Gaussian 


Mixture ABC 


Mixture Exact 


Truth 


Ave. MMSE /3 M 


-0.02 (0.21) 


0.39 (0.27) 


0.42 (0.25) 


0.5 


Ave. Stdev. /3i, 2 


0.28 (0.08) 


0.31 (0.12) 


0.35 (0.09) 


_ 


Ave. MMSE tr (E) 


3.17 (2.03) 


2.61 (2.12) 


2.23 (1.91) 


2 


Ave. Stdev. tr (E) 


0.16 (0.12) 


0.21 (0.16) 


0.19 (0.21) 


_ 


Ave. MMSE m 


-0.03 (0.08) 


-0.01 (0.03) 


0.05 (0.01) 


_ 


Ave. Stdev. /ii 


0.06 (0.03) 


0.08 (0.02) 


0.07 (0.02) 


_ 


Ave. MMSE ^ 2 


4.0E-3 (0.01) 


7E-3 (0.03) 


6E-3 (0.01) 


0.1 


Ave. Stdev. [i 2 


0.05 (0.01) 


0.07 (0.02) 


0.09 (0.03) 


- 


Ave. MMSE ai,i 


-0.06 (0.02) 


0.05 (0.02) 


0.08 (0.04) 


0.1 


Ave. Stdev. 0:1,1 


0.02 (2E-3) 


0.03 (4E-3) 


0.05 (3E-3) 




Ave. MMSE 0:1,2 


3E-3 (0.02) 


-0.19 (0.01) 


-0.21 (0.02) 


-0.3 


Ave. Stdev. 0:1,2 


0.02 (0.01) 


0.02 (0.01) 


0.04 (0.02) 




Ave. Mean acceptance probability 


0.37 


0.21 


1 





Table 2: Sampler Analysis: Ave. MMSE or Stdev is averaged posterior mean or variances obtained 
from estimation of the posterior parameters from 20 independently generated data sets. In (•) are the 
standard error in estimates. In all simulations the initial Markov chain is started far away from the true 
parameter values. 





Gaussian model 


Mixture Gaussian and a-stable intra-day model 


Parameter Estimates 


Gaussian 


Mixture ABC 


Truth 


Ave. MMSE /3i, 2 


-0.01 (0.21) 


0.36 (0.32) 


0.5 


Ave. Stdev. p 1}2 


0.28 (0.08) 


0.41 (0.16) 




Ave. MMSE tr (E) 


2.92 (1.32) 


3.0 (1.49) 


2 


Ave. Stdev. tr (E) 


0.14 (0.07) 


0.21 (0.12) 




Ave. MMSE ^ 


-0.02 (0.07) 


-0.01 (0.09) 


0.1 


Ave. Stdev. /ii 


0.06 (0.02) 


0.10 (0.03) 




Ave. MMSE ^2 


-3.0E-3 (0.01) 


4E-3 (0.03) 


0.1 


Ave. Stdev. [i 2 


0.05 (0.01) 


0.09 (0.03) 




Ave. MMSE a M 


-0.06 (0.01) 


0.06 (0.03) 


0.1 


Ave. Stdev. ai,i 


0.01 (2E-3) 


0.03 (8E-3) 




Ave. MMSE ai, 2 


2E-3 (0.02) 


1E-3 (8E-3) 


-0.3 


Ave. Stdev. 0:1,2 


0.02 (0.01) 


0.03 (0.01) 




Ave. Mean acceptance probability 


0.42 


0.28 





Table 3: Sampler Analysis: Ave. MMSE or Stdev is averaged posterior mean or variances obtained 
from estimation of the posterior parameters from 20 independently generated data sets. In (•) are the 
standard error in estimates. In all simulations the initial Markov chain is started very far away from the 
true parameter values. 
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for the estimation of the cointegration vectors when inter-day noise modeling is incorporated. Again, 
we summarize the results for the cointegration vector (3 of the estimated MMSE in Figure 4 under the 
Gaussian case ignoring the inter-day level shifts and the mixture model proposed in this paper. 



Mixture Model —Symmetric a. - Stable S t 3 (0,1 ,0) Results Mixture Model - Asymmetric a - Stable S 1 3 (0.5,1 ,0) Results 




MMSE (3 1 2 MMSE P 1 2 MMSE P 1 2 MMSE P 1 



Figure 4: Estimated cointegration vector /3. 



7.3 Real Data Analysis 

In this section we particularly focus on the accuracy of estimation of the cointegration vectors (3. These 
are important to the design of algorithmic trading strategies since they are the basis for projection of 
the raw price series to obtain a stationary deviation series to consider trading analysis. In addition we 
provide estimation results for the reversion rate of the stochastic trends to stationari ty as denoted by the 



Peters et al. [2010a 



and 



mat rix a. We an alyze the performance of the basic "Gaussian" posterior model of 

Sugfta [2009^ in the presence of inter-day price series level shifts versus the estimation of the " Mixture 



ABC" model via Algorithm 1. 

The price series for AUD / CD with base currency in AUD sampled at lOmin intervals during the joint 
open market hours. Analysis is performed for the first contract in Table 1, starting from the 05/09/99, 
containing 60 days worth of market data, producing a time series of prices of length 29,621 samples. 
The raw price series are presented in Figure 5 with circles representing the joint open of each market 
(inter-day boundaries). The data was transformed by translation of each series by the median and scaled 
by the standard deviation. The analysis performed considers 30 batches of 2 days of data, giving on 
average 489 data samples per batch, and the posterior parameter estimates are averaged over samplers 
analysis of each data set and presented in Table 4. The results demonstrate that failing to account for 
the inter-day level shifts observed can significantly affect the estimation of the cointegration vectors and 
reversion rates as demonstrated in the comparison in Table 17.31 and in Figure 17.31 

8 Conclusions 

We studied the impact of price series level shifts on statistical estimation of matrix variate parameters 
in CVAR models utilized in algorithmic trading. In particular we first demonstrated the significant 
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AUD and CD (10 min sample) price series 





Figure 5: Price series for AUD and CD. Circles indicate inter-day market time boundaries. 





Gaussian model 


Mixture Gaussian and a-stable intra-day model 


Parameter Estimates 


Gaussian 


Mixture ABC 


Ave. MMSE /3i, 2 


-0.31 (0.25) 


0.18 (0.21) 


Ave. Var. 


0.20 (0.04) 


0.83 (0.08) 


Ave. MMSE ai,i 


-0.02 (1.36E-3) 


-0.01 (3.8E-3) 


Ave. Var. ai,i 


3.90E-5 (4.09E-6) 


5.3E-5 (2.5E-5) 


Ave. MMSE ai, 2 


1.24E-3(1.20E-3) 


-6.3E-4 (1.7E-3) 


Ave. Var. ai,a 


2.18E-5(3.07E-6) 


1.7E-5 (1.0E-3) 



Table 4: Sampler Analysis: In (•) are the standard error estimates obtained from 20 batches of 
MCMC samples each of length 1,000, average over each of the sets of 2 days of data. 



impact on estimation accuracy under both frequcntist and Bayesian estimation frameworks when failing 
to appropriately model observed level shifts in price series. 

Next we developed a composite noise model comprised of Gaussian and a-stable innovation noise for 
the CVAR model in the presence of price series level shifts. The example we illustrated this point on was 
the situation that occurs at deterministic times each trading day, at the inter-day market boundaries. 
However, we point out that our methodology is general and extends also to settings in which the level 
shift times arc unknown a priori. This would modify the problem to additional estimation of the r times, 
then conditional on these estimates, our methodology can be applied. 

Working under this composite noise model of Gaussian and a-stablc CVAR innovations, we developed 
a novel conjugate Bayesian model under transformation, allowing for exact MCMC sampling frameworks 
to be developed in the symmetric heavy tailed a-stable scenario. In the asymmetric skewed noise setting, a 
non-standard approximate Bayesian computation model was developed and an advanced adaptive MCMC 
algorithm was utilized to sample this ABC posterior. This incorporated the conjugate model developed 



2G 



AUD-CD pair - Gaussian Model 




2 day segement index 



Figure 6: Estimated cointegration vector f3 for AUD-CD pair for 2 day segments at lOmin samples. 
TOP: Gaussian model; Bottom: Mixture ABC model; Solid line is estimated MMSE and dashed line is 
posterior 95% C.I.. 

in the symmetric case as an MCMC-ABC proposal for the asymmetric setting. 

We were able to demonstrate and verify on synthetic data sets under both symmetric a-stable and 
asymetric a-stable models, that the sampling methodology we developed for estimation of the MMSE for 
the matrix variate posterior parameters is accurate. We then compared the performance of our model 
and sampler to the standard Gaussian Bayesian CVAR model on real financial pairs, demonstrating a 
marked difference in the estimated CVAR model parameters. Hence, justifying the applicability of such 
a model in applied financial models for trading. 

Our framework was motivated from the perspective that our approach is justified by the assumption 
that the underlying model for the price series pair is appropriately modeled by the basic CVAR mode l 

hen and Hsiao [201oj . 



presented in Section [3] This differs significantly to the underlying assumption of 



If this assumption is not suitable, alterna tive approache s could be considered, such as the use of a Markov 



switching regime model, see for example iKrolzig [1997 1. Under s uch a model the CVAR parameters may 



vary depending on a latent regime state variable, see 



Sueita [2008j | for details. These models are suitable in 



settings in which one believes there is fundamentally a finite set of distinct models suitable for describing 
the statistical properties of the vector price series. In such settings, typically the parameters of each 
model and the transition times for model switching are unknown and must be estimated. The model 
framework we present is distinctly different to this setting, not only do we know the deterministic times 
at which level shifts in the price series occur at the open and close of markets, but we also assume after 
accounting for these, the fundamental CVAR model parameterizes appropriately the underlying assets 
price series. 

Other possible extensions that can be made to your model are considerations of mixtures of Student-t 
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and Gaussian innovation errors for intra-day innovation noise. This would allow one to capture possible 
skew or heavy tailedness present withing the trading day in certain markets, whilst still maintaining our 
conjugacy properties. 
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